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BAYESIAN  INTERPOLATION  AND  DECONVOLUTION 


C.  LARRY  BRET  Til  OR  ST 
Washington  University 
Department  of  (’In  inistry 
Campus  Box  I  Elf 
One  Bwokings  Drive 
St.  Ixtuis,  Missouri  6(1130- 4 899 

ABSTRACT.  The  deconvolution  problem  is  addressed  in  stages  beginning  with  the  interpolation 
problem  when  little  prior  information  is  available  and  proceeding  to  the  full  deconvolution  problem 
when  a  great  deal  of  prior  information  is  available.  The  results  of  the  calculations  indicate  that  good 
solutions  to  the  deconvolution  problem  arc  available  even  when  limited  prior  information  is  available 
and  that  these  results  overlap  those  obtained  when  a  great  deal  of  prior  information  is  available. 
The  difference  between  them  is  that  the  use  of  uninformative  priors  causes  large  uncertainties  in 
the  estimated  signal,  while  highly  informative  priors  decreases  the  uncertainties  in  the  estimated 
signal. 


Introduction 

The  deconvolution  problem  is  important  in  many  branches  of  science  and  engineering.  In  this 
problem  the  “image”  or  signal  is  convolved  with  a  smearing  function.  This  function  is  also  called 
an  impulse  response  function  because  the  ideal  noiseless  signal  that  one  would  obtain  in  response 
to  nn  input  impulse  or  delta  function  is  the  smearing  function  for  detector.  In  linear  systems 
the  output  from  an  arbitrary  input  may  be  written  as  a  convolution  or  average  of  the  true  signal 
convolved  with  the  impulse  response  function.  Averaging  loses  information.  In  addition  the  signal 
is  contaminated  with  noise,  consequently  there  is  no  unique  way  to  deconvolve  the  signal  front  the 
impulse  response  function:  rather  one  must  make  inferences  about  the  true  signal.  In  this  paper, 
the  deconvolution  problem  is  studied  beginning  with  the  simplest  “baby”  version  of  this  problem 
and  proceeding  through  stages  to  more  and  more  complex  versions  of  the  problem  until,  finally,  the 
full  deconvolution  problem  is  analyzed.  At  the  end  of  each  stage,  numerical  examples  are  supplied 
to  illustrate  the  calculations. 

In  the  deconvolution  problem  addressed  here,  there  is  a  data  set.  D  which  is  postulated  to 
contain  a  signal  y(t)  plus  additive  noise: 


<L(t,)  =  i/(f,)  f  n,  (1) 

where  //,  represents  the  noise.  The  data  D  are  a  collection  of  A'  discrete  data  samples.  D 
W(f|  ) . </(  t  v)}-  file  signal  g(t)  is  obtained  from  a  “convolution”  integral  of  the  form 

rt\ 

!){>)  =  I  >>rr{t  -t)u(t)  I  2) 

w  here  r(  / 1  is  I  he  impulse  response  function,  and  ml  I  is  the  unknown  signal.  1  he  data  I)  h.-.v-  bei  a 

w  I  It  t  I’ll  a  one  <  I  i  meilsit  1 1 1  a  I .  alt  lloilgll  the  Ilia  t  lie  III  a  I  |i  will  take  llo  riot  ire  of  this  ami  t  ‘ .  ...  ,,  a 
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may  be  generalized  to  higher  <li mansions  l>y  simply  relabeling  the  higher  dimensional  quantities. 
Hie  signal  that  appears  in  the  detector.  y(t),  will  he  thought  of  as  a  time  series,  although  again 
t  lie  mat  hematics  takes  no  notice  of  this,  and  one  could,  for  example,  interpret  t  as  position,  ;is  one 
would  in  an  image.  The  problem  is  to  make  the  best  inference  possible  for  the  unknown  signal. 
u(l).  from  the  data  and  the  prior  information. 

When  the  impulse  response  function  r(r)  is  a  Dirac  delta  function 

r(t  -  t)  -  t>(t  -  r),  (d) 

the  convolution  integral  may  he  evaluated  and  one  obtains 

d(tx)  —  +  n,.  (4) 

The  deconvolution  problem  has  reduced  to  the  “data  interpolation”  problem.  Clearly  if  one  is  to 
understand  the  deconvolution  problem,  then  one  must  have  a  firm  understanding  of  the  interpolation 
problem.  Tor  this  reason  the  data  interpolation  problem  will  he  studied  in  the  first  two  sections  of 
this  paper. 

In  the  first  section,  the  interpolation  problem  is  addressed,  and  probability  theory  will  be  used 
to  derive  the  posterior  probability  for  the  value  of  an  arbitrary  pixel  given  the  data  and  the  prior 
information.  In  this  babv  version  of  the  problem  the  prior  information  will  be  that  the  signal  should 
be  smooth. 

In  the  second  section,  the  analysis  of  the  interpolation  problem  continues  with  the  use  of  more 
informative  prior  information.  This  more  informative  prior  information  will  include  information 
about  the  functional  form  of  the  signal,  as  well  as  information  about  the  first  and  second  derivatives. 
At  the  md  of  each  sections  several  numerical  examples  are  given. 

In  the  third  section,  the  full  deconvolution  problem  is  addressed  using  the  techniques  and 
procedures  developed  in  the  first  two  sections.  Again  numerical  examples  are  included  at  the  end 
of  this  section.  Then  in  the  fourth  section  the  deconvolution  is  generalized  to  include  more  general 
types  of  prior  information.  Additionally,  more  efficient  means  of  estimating  the  signal  and  the 
uncertainty  in  the  estimate  are  developed. 

1  Data  Interpolation  -  Second  Derivative  Prior  Information 

In  the  data  interpolation  problem,  there  is  a  signal  U.  This  signal  is  to  be  estimated  at  a  num 
her  of  discrete  points.  These  discrete  points  will  be  called  pixels.  Those  pixels  will  be  labeled 
{u0 - }  where 

v  =  ti{  N  ! )  +  1 ,  (5) 

is  the  number  of  the  pixel  corresponding  to  the  last  data  value,  and  pixel  M|  corresponds  to  the  first 
data  value.  The  pixels  labeled  i/|,...,h„  will  be  called  interior  pixels;  while  nn,  and  u,/+i  will  be 
called  boundary  pixels.  These  boundary  pixels  are  special  because  they  must  be  handled  differently. 
The  pixel  density  factor,  3.  indicates  the  density  of  the  pixels  relative  to  the  data.  If  3  —  1  there  is 
a  one  to  one  correspondence  between  the  pixels  and  the  data  (excluding  the  two  boundary  pixels). 
If  i  ~  there  are  two  pixels  for  every  data  value,  etc.  The  discrete  times  I,  correspond  to  the 

pixels,  not  the  data.  So  the  sampling  times  for  the  data  are  given  bv  { D .  t 1 1 .  tj/i+i . T,},  and 

the  data  elements  will  also  be  labeled  to  correspond  to  the  pixels:  {d\  ,(/,/<  i .  il-zi r  4 1 .....  d,, }.  The 
collection  of  all  of  the  data  will  be  labeled  as  I).  while  the  collection  of  all  of  the  pixel  will  be 
labeled  as  f  . 


1 


Hie  data  I)  consists  of  values  of  the  signal  /  '  plus  noise: 


</,  —  u,  +  n,  i  •-  \  1 ,  ;i  +■  1 , 2/i  ■+- 


(6) 


where  u,  is  the  value  ol  a  randomly  varying  component  that  one  lias  no  way  to  predict.  I  he  problem 
is  to  make  the  best  estimate  of  any  one  of  the  pixels  possible.  Because  we  will  estimate  an  arbitrary 
pi xel  iij ,  we  will  have  estimated  all  of  them  by  letting  j  take  on  any  value  {0  <  j  <  v  +  1 }.  Prom 
the  standpoint  of  probability  theory,  all  of  the  information  relevant  to  this  inference  is  contained  in 
a  probability  density  function:  P(u,\D ,  I ),  the  probability  that  the  signal  has  value  Uj,  given  the 
data  and  the  prior  information  /.  This  probability  is  computed  using  the  sum  rule 

P(ti>|D,/)=  ( ydux-P(U\DJ),  (7) 


where  P(U\D,l)  is  the  joint  probability  for  all  of  the  pixel  values.  The  integrals  are  over  all  pixel 
values,  except 

Bayes  theorem  [lj  may  be  used  to  factor  P(U\D,I)  to  obtain 


P(uj\D,I)  =  J ■  •  •  duj  •  •  - 


P(U\I)P(D\U,  /) 
P(D\I) 


(8) 


where  P(U\l)  is  the  joint  prior  probability  for  all  the  pixel  values,  P{D\U,I )  is  the  probability  for 
the  data  given  the  pixel  values,  and  P(D\I)  is  a  normalization  constant. 

Making  the  standard  assumptions  about  the  noise,  the  probability  for  the  data  given  V  is  just 
the  likelihood  function 


P(D\a,U,J)  =  (2*<72r£oxpj J^(d,  -  u,)2l 

by‘/> 


(9) 


where  the  standard  deviation  of  the  noise,  <r,  has  been  .added  to  the  direct  probability  for  the  data 
in  a  wav  that  indicates  its  value  is  known.  Later,  the  rules  of  probability  theory  will  be  applied  to 
remove  o  from  the  problem,  if  its  actual  value  is  unknown.  The  index  i  [on  the  sum  in  1-(|.  f 0 )] , 
means  that  i  starts  at  1  and  goes  to  v  in  step.-,  of  /?.  Substituting  the  direct  probability  into  the 
posterior  probability,  Kq.  (3),  and  assuming  normalization  will  occur  at  the  end  of  the  calculation, 
one  obtains 


P{U)\cr,  D ,  I)  oc  J  ■  •  ■  <fu,  •  •  P(U\I)<j  exp 


I 

2(7  2 


^(d,  -  U, 

i=l 


by/* 


(10) 


The  problem  has  been  reduced  to  specifying  the  prior  probability,  P(U\l). 

If  one  were  to  ignore  the  prior,  as  one  would  using  maximum  likelihood,  then  all  of  the  pixels  values 
associated  with  the  data  values  are  estimated  to  be  equal  to  the  data,  uj  —  dj ,  while  all  of  the  interpolation 
pixels  are  estimated  to  be  zero.  This  is  the  maximum  likelihood  or  least  squares  solution  to  this  problem. 
But  probability  theory  automatically  tells  one  this  is  not  correct.  This  weighted  average  will  be  very 
difficult  from  the  maximum  likelihood  solution.  And  this  difference  is  maintained  even  in  the  limn  of 
very  uninformative  prior  information. 

For  any  given  problem  there  could  be  a  great  deal  of  prior  information  available.  For  example,  if 
the  data  were  the  output  from  a  continuous  wave  radar,  then  the  signal  will  look  highly  :  inusoidal; 


yet  significant  deviations  will  occur  near  the  beginning  and  ending  of  the  signal.  If  the  radar  were  i 
pulsed  radar,  the  signal  would,  at  least  superficially,  be  like  the  derivative  of  a  Gaussian.  Again  there 
could  be  significant  deviations.  "This  information  is  qualitatively  different  from  that  normally  associated 
with  a  model,  where  the  prior  information  insists  that  the  signal  must  be  of  a  certain  functional  form  and 
any  deviations  from  it  are  to  be  considered  noise.  Here  the  signal  should  be  allowed  to  make  deviations 
from  the  functional  forms  when  the  data  shows  evidence  for  such  deviations.  This  type  of  prior 
information  will  be  called  "soft"  because  we  do  not  insist  that  the  signal  have  this  functional  form. 

In  addition  to  this  soft  prior  information  about  the  functional  form  of  the  signal,  one  might  know 
some  general  characteristics  about  the  signal.  For  example,  the  signal  might  be  generated  =bv  some 
analogue  electronics.  Electronics  never  generates  perfectly  sharp  signals;  it  always  averages  things  out 
Couid  smoothness  be  used  as  "soft"  prior  information? 

The  answer  to  this  question  is  yes!  It  is  possible  to  include  both  types  of  "soft"  prior  information 
the  calculation.  Probability  theory  can  be  told  that  the  signal  is  more  or  less  sinusoidal,  without 
insisting  that  it  be  sinusoidal,  just  as  it  will  be  possible  to  tell  probability  theory  that  the  signal  should  be 
smooth  with  insisting  that  the  signal  must  be  smooth.  To  see  how  to  do  this,  the  interpolation  problem 
wiil  be  investigated  using  both  of  these  types  of  "soft"  prior  information.  We  begin  by  including  pri.  r 
information  about  the  "smoothness"  of  the  signal,  and  then  in  the  next  section  proceed  to  include  \or; 
inrormation  about  die  functional  form  of  the  signal. 


1.1  Constraining  The  Second  Derivative 


In  the  traditional  interpolation  problem,  the  data  is  assumed  noiseless  and  one  is  trying  to  inter¬ 
polate  between  data  values.  The  criteria  used  in  splines  is  typically  minimum  arc  length,  and  one 
seeks  the  shortest  interpolation  function.  Here  noise  is  allowed  into  the  problem.  This  noise  could 
be  either  positive  or  negative  and  its  effect  is  to  make  the  data  “jitter”  ..round  the  “signal”  in 
a  random  wav.  This  jitter  should  be  suppressed  as  much  as  possible.  Mathematically  this  jitter 
corresponds  to  a  rapidly  varying  second  derivative.  It  can  be  suppressed  if  the  second  derivative  of 
the  signal  can.  in  some  sense,  be  made  “small.” 

The  data  are  sampled  at  discrete  times.  The  lirst  and  second  derivatives  are  not  defined  for 
discrete  functions.  However,  one  can  define  analogous  quantities  which  reduce  to  the  first  and 
second  derivative  as  the  sampling  density  goes  to  infinity.  The  first  derivative  of  a  continuous 
function  mav  be  defined  as 


df(t)  f(l  f  A)  -  /(<  -  A) 

— . —  =  lim  - - - 

dt  A-o  2A 


(in 


For  a  discretely  sampled  function  this  becomes 


df(tt)  f(t,+  A)  -  /((,  -  A) 

dt,  2A  (U 

where  fit,  -f  A)  =  f(t,+\)  is  the  function  at  the  forward  sampling  time.  f(t ,  -  A)  -=  /(/,_])  is  the 
function  at  the  backward  sampling  time,  and 


A  _  f,+ 1  t ,  -  t,  —  f|_i 


( i  .n 


is  the  sampling  time.  It  is  clear  from  this  definition  that  the  discrete  first  derivative  is  onlv  an 
approximation.  This  approximation  is  accurate  in  order  A.  So  if  delta  is  0.01.  i.e.,  if  data  were 
colh  <  ' ed  .'very  0.01  seconds,  then  the  discrete  first  derivative  will  be  accurate  t < >  fO.Ol. 


1  11.'  second  derivative  is  just  a  derivative  ol  a  derivative  and  is  defined  as 


dt 2 


f(<  +  2A)  -  f(t)  _  f(i)~  f(t-  -2A) 

litn  - ^ ^ - 

A  -o  2A 


(14) 


This  can  be  rewritten  as 

d2f{t)  _  f(t  +  2 A)  f  /((-  2A)-2/(f) 
dt2  a"o  4A2 

The  corresponding  equation  for  a  discretely  sampled  signal  is  given  by 

d2f(u)  )  +  /(<«-!  )-2f(tj) 

dt'2  A2 

Note  that  this  approximation  is  accurate  to  order  A2,  so  if  A  is  small,  second  derivatives  my  be 
evaluated  very  precisely,  provided  sufficient  machine  accuracy  is  available. 

Now  that  we  have  a  definition  of  the  discrete  second  derivative,  the  prior  information,  that  it 
must  be  “small”  must  be  translated  into  a  prior  probability  P(U\I).  But  the  second  derivative  can 
be  positive  or  negative.  Additionally,  the  second  derivative  is  defined  at  every  data  point,  so  what 
is  meant  bv  “small”?  Here  “small’  will  mean  that  the  mean-square  value  of  the  second  difference 
should  be  small: 

V 

53  +  nJ-l  '  2n)\2  ~  |S2’  0”) 

J  =  l 

where  b2  is  the  total  second  difference.  This  equation  will  be  referred  to  as  a  constraint  on  the 
second  derivative  for  reasons  that  will  become  apparent  shortly.  The  quantity  6,  is  a  measure  of 
the  “smallness”  of  the  second  derivative.  When  b  is  large,  large  jitter  is  allowed  and  the  signal 
will  be  estimated  to  be  the  data  values.  When  b  — >  0,  no  jitter  is  allowed,  and  the  signal  will  be 
estimated  to  be  constant.  Somewhere  between  these  extreme  values  is  one  which  will  suppress  the 
jitter  without  suppressing  the  signal. 

Note  that  this  constraint  introduces  other  parameters  into  the  problem.  If  for  example  ,3  = 
1,  the  constraint  introduces  three  new  parameters:  two  “boundary”  pixels,  no  and  u^+i,  and  a 
regularization  parameter  which  will  be  called  (  and  is  related  to  b2.  If  (3  >  1,  the  constraint  also 
introduces  the  “interpolation”  pixels  into  the  problem. 

The  process  of  converting  Kq.  (17)  into  a  prior  probability  density  function  is  a  straightforward 
application  of  the  principle  of  maximum  entropy  and  results  in  the  assignment  of  a  Gaussian  prior 
probability: 


(15) 


(16) 


/’(til, .  •  •  ,  M„+1,«,<T,/)  oc  exp  <  -^2  53  fu,+>  +  ,i,_l  ~  '2u«32  |  > 


(IS) 


where  (2/n2  is  the  Lagrangue  multiplier  from  the  maximum  entropy  calculation.  The  fractional 
variance  < 2  will  be  used  to  control  the  amount  of  smoothing  and  is  related  to  the  mean-square 
second  derivative. 

I  hree  additional  parameters:  i/()  and  u„+|,  the  boundary  pixels,  and  the  fractional  variance  > 
have  entered  the  problem.  These  parameters  were  added  to  the  prior  in  a  wav  that  indicates  that 
their  values  are  given.  Of  course  in  a  real  problem  their  values  will  not  be  known  and  inferences 
must  be  made  about  them.  All  three  of  these  parameters  are  nuisances  in  the  sense  that  one  would 
like  to  formulate  the  problem  independent  of  their  value.  This  mav  be  done  readih  foi  and 

'i  i,  M  :  but  i  will  prove  in  be  harder  to  deal  with. 


What  we  have  derived  so  far  is  the  prior  probability  for  the  interior  pixels  given  the  boundary 
pixels.  What  is  needed  is  the  prior  probability  for  all  of  the  pixels.  To  compute  this  the  joint  prior 
for  ail  of  the  pixels  is  factored  using  the  product  rule  to  obtain: 


P  (no, - Uj/+i|i  .a!)  -  Pi  nn,  «„+)  |t,  a.  /)P(u) , . . . ,  ut,|un,  u„+i,<,<r,  I)  (ID) 


where  P(u o, - u„+ijc,fr/)  is  the  joint  prior  for  the  interior  and  boundary  pixels;  the  joint  prob¬ 

ability  for  the  interior  pixels  given  the  boundary  pixels,  P(u\ , . . . ,  u„|uo,  u^+j,  ♦  , a. /),  is  given  bv 
Eq.(  IK)  and  P(uo,U|,+  i|< .  <?.  1 )  is  the  prior  probability  for  the  boundary  pixels. 

To  assign  the  prior  probability  for  these  two  boundary  pixels,  /’( uo,  u,,+  |  |r ,  <r,  / ),  a  different 
interpretation  of  the  second  derivative  will  be  used.  Suppose  it  is  known  that  adjacent  pixels 
should  be  approximately  equal: 


«i+i  +  u.-i 


(20) 


This  may  be  rewritten  as 


«I+I  +  M.-i  -  2 u,  ^  0. 


(21) 


But  this  is  essentially  just  the  statement  that  the  second  derivative  should  be  small.  So  constrain¬ 
ing  the  second  derivative  to  be  small  is  equivalent  to  asserting  that  neighboring  pixels  should  be 
approximately  equal.  On  the  boundary  this  could  he  interpreted  as 


uo  ~  W)  and  uvvuv+\. 

Converting  this  prior  information  into  a  prior  probability  for  uo,  one  obtains 


(22) 


P(uq|«,<t,/)  oc  exp  J  j(uo  -  u,)2  l  , 


and  similarly  for  u^+i 


I  <2 

P(u„+,|<,<r,/)  oc  exp  (  -  — j  (u„+i  -  u„)  J>  . 


(23) 


(24) 


To  combine  these  priors,  one  uses  the  product  rule  to  factor  P(«o,u,,+||<,ct,  /),  and  assuming 
independence  one  obtains: 


P(«o,  u„+i|«,<t. /)  =  /’(uo|c.^.  I)P{u„+]\i  ,a.  I). 

Substituting  for  P(uo|t,a.  /)  and  P(u„+i|<,<7,  /),  one  obtains 

!»2  < 2  ' 
P(«0,  ur+!|r.cr.  /)  oc  exp  ^  (un  -  111]2  -  —  [u„+i  -  u„]2 


(2  6) 


(26) 


as  the  joint  prior  probability  for  'he  boundary  pixels.  Substituting  the  joint  prior  for  the  boundary 
pixels.  Eq.  (26),  and  the  prior  for  the  interior  pixels,  Eq.  (18),  into  the  prior  probability  for  all  of 
the  pixels  including  the  boundary  pixels,  one  obtains 


P(uo . ur+i|<,ir, /)  nc  exp  ^  — —  [un  -  wj]2  -  [«,.+  |  -  - 


X  ex  / 


J  (J  *•  «. 


HI  *  -  I  \ 


S 


I  Ins  |mnr  <  an  be  p-wrii  ten  .is 


/’(  . . 1 1 <  ,rr,I)  -  [An  ■  ■  ■  A„+1)-' 


1  i2ku- 


r  H  I'  4-1 


i-.is) 


/—(I 


wIhti'  (An . A,,+  i}  .iri'  tin'  eigenvalues  of  the  matrix  Ru  defined  as 


/  2 
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(29) 


Note  that  in  writing  the  prior  in  this  form,  it  has  been  implicitly  assumed  that  the  Ru  matrix 
is  not  singular.  A.s  this  prior  is  written,  this  is  not  the  case!  The  Ru  matrix  has  one  singular 
eigenvalue.  Apparently  one  of  the  two  boundary  conditions  was  redundant.  This  problem  must  be 
resolved  before  any  numerical  calculations  can  be  done.  The  condition  that  the  boundary  pixels 
should  be  approximately  equal  to  the  interior  pixels  at  the  boundary  be  maintained.  This  can  be 
done  by  making  a  slight  change  in  the  boundary  conditions: 


tin  ~  0.099  m  | 


and 


H/V  ~  1.00  lu,,  f|. 


(30) 


Making  this  slight  change  removes  the  singular  eigenvalue  and  allows  the  prior  to  be  normalized, 
without  changing  the  spirit  of  the  boundary  condition. 


1.2  Eliminating  Nuisance?  Parameters 

Now  that  the  prior  has  been  specified,  it  may  be  substituted  into  the  posterior  probability  for  pixel 
n j.  I  t|.  (  ID),  to  obtain 


/’( tij|i ,  n,  I),  I ) 


k 


(lu,  ■  [An  •  ■  •  A,,4|]2  cr 
(  f  2  eft  e-f  1 

*  ,1'11 


1  -(A-cetJ)  v+'i 


k~ ill  /=U 

V 


i -i 

bv  ;< 


I  lie  integrals  are  over  all  pixels,  except  i  =  j.  There  are  n  -f  1  integrals  to  evaluate, 
lo  evaluate  these  integrals,  the  exponent  in  the  likelihood  is  squared  to  obtain: 


/’(  n,|< .  o,  1 1,  I ) 


k 


(lu,  ■  ■  [  A0  •  •  A,,4  j  ) 2 


2  (T 


(1'4.%'tn 


<f) 


y  exp, 


e  eft  i  H 

an/-’  ->y  ,i,u,  t 

'ik  i 11 1  “  i 

J  I  1  I !  /  e 

I.V  .1 


(31) 


f.TT. 


w !:•■[•'  i’-  is  the  mean-square  dal. a  valuo.  defined  a. 


Ji  ,  J,  \ ' 

v  ft 

liy  1 1 


and  the  interaction  matrix  <ju  is  defined  as 


<IU  ~  tRkl  +  Ski  l)  <  kj  <  v  +  1. 


The  matrix  .S’jtl  is  diagonal  and  defined  as 


1  If  k  =  l  and  mod(A:  —  l,  ft)  =  0 


0  otherwise. 


where  "mod(A:  —  l,/3)  =  0”  means  that  (k  —  1)  is  evenly  divisible  by  ft. 

There  is  no  integral  over  Uj ,  consequently  u}  behaves  like  a  constant.  Separating  u,  from  the 
integration  variables  one  has 

P{uj\t,aJ)J)  rx  J  •  •  • duj ■■■  [A„  •  •  • 


X  exp 


Nd2  -  2d}UjZ  T  gjjU2}  ] 
2a'2  ( 


{j  ,  i/+l  i/4-l  v  a 

”^2  gklUkUl  ~  2  Zj/  “  !)\}nj\u*  | 

I?]  ,,=l~ 


where 

f  1  if  j  =  11.  ft+  \,2ft  +  1 . t/} 

'  =  <  (»7) 

I  0  otherwise. 

Now  that  the  dependence  on  uj  has  been  separated  from  the  integration  variables,  the  integrals 
may  be  done  by  the  following  change  of  variables: 


e+i 

U  =  yjx t  (k  ^  j), 


where  the  un  are  given  by 

«/+l  /  . 

«fc  =  (k/j),  (39) 

t;t;  V 

and  ,\J  i.-  the  ith  eigenvalue  of  the  jth  rofactor  of  the  <y,jt  matrix,  Kq.  (•(1).  and  r,n  is  the  A’th 

component  of  its  ith  eigenvector.  As  a  reminder,  the  jth  rofactor  of  a  square  matrix  ot  rank  r  t  2 

is  a  -quare  matrix  of  rank  r  1  I.  I  lie  cofactor  is  formed  bv  deleting  tlm  / 1  h  row  and  <olumii 

from  I  q  i  dll.  Note  that  in  defining  the  rofactor  matrix  the  indices  have  not  been  relabeled:  tliev 

,-tiil  run  iiom  zero  to  r  r  I:  tnc.vever.  the  ;i|i  item  m>  longer  exists  and  must  be  skipped  in  ail 


It! 


Mimm.ifioii.  I  his  will  In-  in  the  equations  wlieie  applicable.  I  hev  new  integration  variables 

have  the  property  that 

e+l 

V '  Olk^xk  =  K'  tl  (*. t  f  j)>  (40) 


Y€lke'k  =  (j,i  #  j) 

k=0 

k*] 


wherp  6/,  is  the  Kronecker  delta  function.  The  volume  element  of  the  transformation  is  given  by 

dAo  -  dAj^idAj+i  -  dAv+x  ,  j 

- ■  ■■  -- ""  =  —  ■  —  =  dun  •  •  •  duj..\du.+\  ■  ■  ■  dUt,+\.  (4^ 

/v  ...  v  v  ...  v 

\J*  o  Ae+1 

Making  the  change  of  "ariablcs  and  introducing  a  new  quantity  h[(uj): 


1 

hi(Uj)  =  — ^  (d«  “  fh)u)]cH  (l  #  j) 

,  A}  :=1 
V  1  lw  H 


one  obtains 


e,  /)  o<  (-  ™  1  >»f<  -  ,l(u') :  '±!l 


p|-^V  [a,  -  j 


where  the  square  on  the  quadratic  terms  was  completed,  some  factors  of  2tr  were  dropped,  the 
determinate  (which  is  a  constant  here)  was  also  dropped .  The  quantity  h(uj)  ■  h{uj )  is  defined  as 

e+l 

h(uj)  ■  h(u,)  =  ^ )2.  (45) 


Evaluating  the  i/  +  1  integrals  gives  a  factor  of  (‘27rrT‘!)(*/+1^,  and  one  obtains 


/-<«>, ,.e.  n  ,x  (46) 

as  t  he  posterior  probability  for  the  jth  pixel.  If  as  assumed  so  far,  the  variance  of  the  noise  and  the 
value  of  the  fractional  variance  i  are  actually  known,  then  there  are  a  number  of  additional  terms 
that  are  constants  and  these  constants  will  cancel  when  the  distribution  is  normalized.  Dropping 
these  terms,  one  obtains 


/’(  u;\n.( ,  I ),  l)  exp 


2dj<t,z  -  </j,ui  +  h(u,  )  •  li(  Uj) 


•  is  the  posterior  prohahili  t  v  lor  the  j\  h  pixel  given  t  lie  m  amlard  deviation  of  the  noise,  tie-  tract  innai 
'.anainr,  i  lie  data  l>,  .onl  the  prior  in lonnat ion  /. 


l.B  Kliininating  rr  As  A  Nuisance  Parameter 

In  most  real  problems  neither  a  nor  <  are  known;  they  are  nuisance  parameters  and  should  lie 
treated  .is  such.  This  is  easy  lor  rr,  but  <  is  more  difficult  to  deal  with.  To  make  inferences  about 
Uj  independent  of  <r  we  apply  the  sum  rule  to  obtain 

/’(uj|c,  D, /)  =  j d(rP(uj,a\f,  l), /).  ( IK) 

The  right-hand-side  of  this  equation  may  be  factored  to  obtain 

P(ujM'*DJ)  =  P(uj,a\<,I)P(D\u„<r,<,I) 

=  P(«j\I)P(a\I)P(D\uj,<r,tJ)  (49) 


=  P(cr\r)P(uj\D,a,(,I) 

where  it  was  assumed  that  the  prior  probability,  P(u},  <r\t,  /),  was  independent  of  f  and  that 
P(u},(t\I)  —  P(ui\I)P(a\I).  Inserting  this  result  into  Eq.  (48)  one  obtains 

P(uj\i,  D, /)  =  J d<rP(<r\I)P(uj\tT,t,  D,I)  ( -r»0 ) 

where  /’(<r|/)  is  the  prior  probability  for  the  variance,  and  P(U)\cr,i,  D ,  /)  is  proportional  to  Eq.  (  Hi). 

The  posterior  probability  for  u}  may  be  computed  provided  a  prior  is  assigned  to  the  noise 
standard  deviation.  Having  no  specific  information  about  cr,  a  Jeffreys  prior  I /a  [4]  is  assigned  to 
obtain: 


P(uy|c,  D, /)  oc  f  daa  N  exp  j-  — 2  ^Nd2  -  2djUjz  +  gJ}uj  -  h(u})  ■  h(u})  |. 


Evaluating  the  integral,  one  obtains 


P(u} jc,  DJ)  a  1 


/'(«>)  •  +  'ZdjUjZ  - 


This  is  a  Student  (  distribution,  and  it  is  this  result  that  is  applied  in  the  numerical  examples. 

Suppose  a  simple  experiment  has  been  run  for  100  seconds  and  a  data  item  was  gathered  every 
second,  thus  obtaining  N  =  100  data  samples.  Suppose  the  data  gathered  in  this  experiment 
looked  like  that  shown  in  Fig.  1  (a  constant  signal  of  value  5,  plus  Gaussian  white  noise  of  standard 
deviation  of  1).  In  the  calculation  so  far,  only  one  pixel  may  be  estimated  at  a  time.  But  any 
pix<4  may  be  estimated,  so  all  of  them  may  be  estimated.  In  this  numerical  example,  j  -  59  will 
be  used.  At.  the  end  of  the  example,  the  results  will  be  shown  for  all  of  the  pixels.  To  estimate 
U59  one  needs  only  to  apply  the  posterior  probability  for  the  pixels.  But  this  probability  density 
function  assumes  the  value  of  >  is  known  and  the  estimated  pixel  value  depends  on  what  value  of  c 
is  chosen.  Before  the  pixel  value  can  be  estimated,  a  procedure  must  be  developed  that  allows  one 
to  estimate  or  set  c  to  a  reasonable  value. 


1.4  Estimating  The  Itegularizer  r 

If  one  follows  the  rules  of  probability  t.lworv  exact  I  v,  the  way  to  proceed  is  to  multiply  the  pmbabilil  v 
for  the  pi;»c|  niven  the  value  of  < .  bv  a  pi  ior  probahiiil  v  for  »  and  integrate.  I'  utort  iiuatoj  v.  <  appeals 


Figure  1:  Interpolation  ’['lie  Data 


o 


Fir.  1.  I'he  tlata  contain  a  constant  signal  of  value  5,  plus  noise  of  standard  deviation  one.  The  problem  is 
lo  make  the  best  estimate  of  a  pixel  given  only  the  information  that  the  function  must  be  smooth  and  these 
data. 


in  the  problem  in  a  very  nonlinear  way  and  evaluating  the  integral  in  closed  form  has  not  proven 
possible.  However,  there  are  approximations  which  will  allow  one  to  proceed  and  obtain  results 
that  are  nearly  identical  to  the  exact  procedure.  If  the  joint  posterior  probability  for  pixel  U59 
and  (  is  sharply  peaked,  then  removing  the  regularizer  by  integration,  essentially  just  constrains 
the  regularizer  to  its  value  at  the  maximum  of  the  joint  posterior  probability.  If  the  value  of 
the  regularizer  near  the  maximum  ran  be  determined,  then  r  can  be  constrained  to  this  value 
in  Eq.  (52).  The  results  obtained  will  be  nearly  identical  to  what  would  have  been  obtained  by 
removing  <  by  integration  [7], 

To  determine  a  reasonable  value  of  c,  the  probability  density  for  the  regularizer  will  be  computed. 
From  this  probability  density  function  one  can  locate  the  value  of  >  for  which  the  posterior  is 
maximized.  This  maximum  may  be  used  in  Eq.  (52)  to  obtain  ihe  posterior  probability  for  the 
pixels.  I’lie  estimated  pixel  value  are  dependent  on  the  value  of  t,  r.o  it  is  important  that  a  value 
near  tin'  most  probable  value  be  used  when  estimating  the  pixels. 

To  illustrate  that  a  good  estimate  of  <  is  necessary,  consider  Fig.  2.  Here  two  different  values 
of  (  were  used:  one  small  and  one  large.  In  panel  2(A),  (  =  0.01.  The  data  values  are  shown  as 
open  circles,  and  the  reconstruction  is  shown  as  the  solid  line.  The  pixel  estimates  plotted  are  the 
mean  or  expected  values  of  the  pixels.  These  were  computed  using  the  procedures  developed  in 
Section  d.l.  For  now  it  is  enough  to  know  that  the  values  are  just  the  ones  given  by  the  maximum 
of  tin'  posterior  probability  for  the  pixels,  given  the  value  of  c,  Eq.  (52).  For  small  c,  the  prior 
information  is  essentially  irrelevant,  and  the  pixels  are  estimated  to  be  equal  to  the  data  values. 
This  effect  is  seen  in  panel  21  A),  where  the  reronst ruction  follows  I  he  dat  a  almost  exactly.  The 
opposite  elici  t  occurs  when  <  00.  Here  the  prior  is  important  and  the  data  are  irrelevant,  and 

the  pixels  are  estimated  li>  be  a  constant,  zero.  Somewhere  between  ihcse  two  extreme  values  is 
region  which  1  .ippiopii.il'-  (or  this  problem. 
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Figure  2:  The  Estimated  Pixels  As  A  Function  Of  • 
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Fig.  2  Panel  (A)  contains  the  data  (open  circles)  and  the  estimated  pixel  values  (solid  line)  for  f  =  0.01. 
Here  <  is  too  small  and  the  reconstruction  pays  too  much  attention  to  the  data.  In  panel  (B),  <  =  1,000.000, 
and  is  too  large;  the  estimated  pixels  (solid  line)  does  not  pay  enough  attention  to  the  data  (open  circles). 
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Id  lind  this  region.  one  can  compute  the  posterior  probability  for  <.  Using  the  sum  rnl*-  from 
probability  theory,  this  is  given  l>y 


l>{(\DJ)  ~  j du0- ••duv+\dali[<1tT,v.n . uu+i\DJ).  (.r>3) 

The  integrand  ran  he  factored  using  the  same  steps  shown  in  Eq.  (  IS)  to  obtain 

f'(<\U,l)  -  J  duo  ■■  ■  dul,+id(jl>((,(j\l)l‘{uo, - D,I)  (54) 

where  P(t,cr\l)  is  the  joint  prior  probability  for  r  and  a.  Further,  /’(«<), . . . ,  u^+i|c,(T,  D. /)  ran  be 
factored  to  obtain 

P{t\D,l)  =  f  du0...dul/+idar((\l)P{a\I) 

J  (55) 

X  /1(uo,...,u„+i|r,<71/)/:>(D|r,a,wo,.-  ..t/^+i,/) 

where  P(uo,. . . ,  u„+t|<-,fr, /)  is  the  prior  probability  for  all  of  the  pixels  given  t,  a,  and  the  prior 
information  /;  and  it  is  given  by  Eq.  (18),  P(/J|<,<t,u0,.  . .  is  the  likelihood  for  the  data 

and  is  given  by  Eq.  (0),  and  is  the  prior  probability  for  a  and  was  assumed  independent  of 

1.  Substituting  Eq.  (0)  for  the  likelihood,  Eq.  (18),  for  the  prior  probability  for  the  pixels  and  a 
Jeffreys  prior  for  both  1  and  a  one  obtains: 

oc  J  duo...duv+\dcr{ 


X  exp^-; 
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wtu're  the  eigenvalues  ( An, . . . ,  A„+| }  must  now  be  kept,  because  they  are  functions  of  f. 

Id  evaluate  these  u  -f-  4  integrals  (1/  +  2  integrals  over  the  u,,  and  one  over  a)  the  quadratic  in 
the  likelihood  is  expanded  to  obtain  something  very  much  like  Eq.  (52): 


P((\DJ)  cx  J  uj,  d<r(Ao  •••  A^+|]2<r 


—  (e-f  A  + 1 )(  e+ 1 


i/+l  e+1 


x  oxp  1  N(i*  ~ 2  S dxUi  +  YL  Yl9ktukui  (’ 
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where  <m  was  defined  earlier  in  E<|.  (44).  Unlike  vvhat.  was  done  earlier,  here  there  are  i/  +  2  integrals 
over  all  of  the  Thus  no  intermediate  steps  are  involved  where  the  cofartor  of  (ju  was  defined. 
All  that  is  necessary  is  that  the  <m  matrix  be  diagonalized. 

In  the  process  of  doing  these  calculations,  several  matrices  will  have  to  be  diagonalized,  and 
1  lie  procedure's  for  doing  so  are  all  essentially  the  same.  One  introduces  a  new  set  of  integration 
vuiiahles  based  on  the  singular  value  deromposit ion  of  tin"  interaction  matrix,  and  transforms  to 
the  new  variable's.  In  these  variables  all  of  t  he  (iaussiau  quadrature  integrals  separate  and  mnv  le 
don*’  trivially.  Iterau.e  all  id  these  integrations  .oe  very  similar,  the  details  will  be  omit  tr>i  and 


only  tli''  results  of  the  calculations  Riven.  In  tins  case.  after  having  evaluated  the  u  +2  integral.-* 
the  posterior  prohahilitv  for  <  independenl  of  the  pixel  values  is  given  l>v 


where 
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{ Ao . A^+i }  are  the  eigenvalues  of  the  R ,*  matrix  defined  in  Eq.  (29)  and  { A(b  . . . ,  }  and  e/, 

are  the  eigenvalues  and  eigenvectors  of  the  g ,*  matrix  defined  in  Eq.  (34). 

The  remaining  integral  is  very  similar  to  what  was  done  earlier,  Eq.  (52),  when  a  was  removed 
and  again  only  the  results  are  given  here 


f‘((\[)J)  a 
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When  <  —  0,  there  is  effectively  no  prior,  and  the  pixel  estimates  go  to  data  values.  However, 
when  r  —  oo,  the  prior  dominates  and  forces  the  second  derivative  to  zero  and  the  reconstruction 
goes  to  a  constant.  As  c  =  0  the  likelihood  term  (the  term  in  square  brackets  in  Eq.  ((31 )]  is  going 
to  infinity  like  i~N.  However,  the  prior  term  (essentially  (l/+l)  is  going  to  zero  at  exactly  the 
same  time.  Somewhere  between  these  two  extreme  values  there  lies  a  maximum  in  the  posterior 
probability  that  acts  as  a  trade  off  between  the  prior  and  the  likelihood. 

f  igure  1  contains  a  simple  data  set  with  N  =  100  data  values.  The  “signal"  in  these  data  is  a 
constant  of  value  5,  plus  additive  white  noise  of  standard  deviation  1.  Using  the  procedures  derived 
so  far,  the  value  of  the  59’th  pixel  is  to  be  estimated.  As  was  mentioned  earlier,  before  the  value 
of  pixel  us 9  may  be  estimated,  one  must  set  the  value  of  c.  Using  the  posterior  probability  for 
c,  this  may  now  be  done.  1’his  probability  density  function  is  plotted  in  Fig.  3.  This  probability 
distribution  has  a  well  defined  maximum  near  70,  and  a  mean  value  of  approximately  93.  Note 
that  for  values  of  c  smaller  than  10  and  larger  than  270,  the  probability  for  »  is  essentially  zero.  So 
whatever  value  of  €  is  used,  it  should  be  somewhere  in  these  bounds. 

Figure  4  contains  the  posterior  probability  for  ur,9  given  •  -  10,80,93,  and  200.  Note  that 
for  the  maximum  and  mean  (Panels  B  and  C),  the  posterior  probabilities  are  almost  identical. 
However,  when  (  is  too  small  (Panel  A),  the  posterior  probability  is  smeared  out  and  broad;  on  the 
other  hand,  when  <  is  too  large  (Panel  D),  the  posterior  probability  is  too  narrow.  It  is  interesting 
to  note  that  as  <  —  0  the  width  of  the  posterior  probability  becomes  large,  while  the  estimated 
pixel  values  go  to  the  data  values.  Estimating  the  pixels  to  be  equal  to  the  data  is  the  maximum 
likelihood  result.  In  this  limit,  there  is  no  prior  information  about  the  signal.  Probability  theory 
is  warning  you  that  there  is  no  way  to  differentiate  between  the  signal  and  the  noise;  the  signal 
could  be  anything  consistent  with  the  total  moan-square  data  value.  In  the  other  limit.  <  —  x. 
deviation-  from  a  constant  are  not.  allowed.  Essentially  the  results  goes  in  the  mean  t  standard 
deviation  estimate  of  the  constant. 
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figure  .1:  The  Posterior  Probability  for  < 


fie  ■'  'l  lio  posterior  |>rol»al>ility  for  (  was  computed  using  the  constraint  on  the  second  derivative  This 
prohahility  density  function  has  a  well  defined  maximum  with  a  peak  near  <  ~  80,  and  a  mean  value  of  93. 

2  Data  Interpolation  -  General  Prior  Information 

before  proceeding  to  l ho  deconvolution  problem,  the  data  interpolation  problem  will  be  generalized 
to  include  other  types  of  prior  information.  Three  types  of  prior  information  will  be  included: 
information  about  the  functional  form  of  the  signal  and  about  its  first  and  second  derivatives.  As 
was  demonstrated  in  the  previous  section,  what  differentiates  the  results  of  a  probability  theory 
i  alculation  from  a  maximum  likelihood  or  least  squares  calculation  is  the  presence  of  the  prior 
probability.  In  the  previous  section  onlv  prior  information  about  the  second  derivative  was  used, 
here  three  different  types  of  prior  information  will  be  used.  To  utilize  all  of  this  information  there 
are  two  tasks  that  must,  he  completed:  first,  each  of  these  three  pieces  of  information  must  be 
formulated  into  a  prior  probability,  and  second,  these  different  priors  must  be  combined  into  a 
single  prior  which  expresses  all  three  pieces  of  information. 

To  see  how  to  convert  each  of  the  three  types  of  prior  information  into  a  prior  probabilities, 
suppose  the  signal  is  known  to  be  sinusoidal.  The  total  difference  between  the  signal  and  the  data 
is  gi  yen  bv 

N 

y  [u,  -  A  cos(u><,  +  #)].  ( f>2 ) 

i=l 

What,  is  actually  known  about,  this  difference?  Would  one  expect,  this  to  be  zero,  positive,  or 
negative?  If  the  signal  is  known  to  be  more  or  less  sinusoidal,  then  on  average  one  would  expei  t  the 
difference  to  be  small  and  its  value  could  be  either  positive  or  negative.  So  the  prior  information  is 
consistent  with  a  zero  mean  value:  i.e.,  no  information  is  available  that  would  lead  us  to  expect  this 
difference  to  be  either  positive  or  negative  upon  repeating  the  experiment  many  times.  Second,  tlm 
mean  square  difference  is  expected  to  be  nonzero;  i.e.,  we  expect,  some  deviations  from  th«  model. 
Now  i  he  prim  i  pie  nl  m.iMiini  m  ent  ropy  can  lie  used  in  assign  a  ptob.i  iulit  v  density  him  t  ion  :  ■ .  lb: 
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ilili<T«nn'  Mut  nuU*  ili.it  this  difference  is  not  necessarily  noise,  it  merely  reflects  our  uncertainty 
in  the  actual  functional  form  of  the  signal.  When  maximum  entropy  is  applied,  it  will  assign  a 
( l.nissian  prior  to  this  difference.  Because  for  a  fixed  mean-square  the  <  laussian  has  highest  entropy, 
and  is  therefore  the  least  informative  distribution  possible.  From  the  (iaussian  distribution  one  can 
assign  a  prior  probability  for  the  difference  between  the  pixels  and  the  model.  For  this  sinusoidal 
example,  this  probability  density  function  is  given  by 


/’( u,| A,u>.  0,  < ,  <t, /| )  =  a  ^c^exp 
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[u,  —  A  cos(lj),  4-  #)] 
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where  the  parameter  r  measures  the  amount  of  misfit  between  the  pixels  and  the  model.  As 
occurred  in  the  previous  example,  this  prior  has  introduced  a  number  of  additional  parameters:  A 
an  amplitude,  0  a  phase,  u>  a  frequency,  and  a  fractional  variance  (2.  Some  of  these  parameters  may 
lie  known,  but  more  likely  either  they  will  have  to  be  eliminated  from  the  problem,  or  inferences  will 
have  to  be  made  about  them.  For  the  time  being,  no  assumptions  will  be  made,  and  the  problem 
will  be  formulated  in  a  way  that  either  they  may  be  eliminated  as  nuisances  or  inferences  may  be 
made  about  them. 


2. 1  Formulating  The  Prior  Probability 

Three  types  of  prior  information  will  be  included  in  this  generalization  of  the  interpolation  problem: 
information  on  the  functional  form,  and  on  the  first  and  second  derivatives.  These  will  be  labeled 
/ 1,  1 1  and  h  respectively.  The  prior  for  each  of  these  will  be  formulated  separately  and  then 
combined  into  a  single  prior  for  use  in  the  generalized  interpolation  calculation. 

Information  I\  will  be  addressed  first.  This  information  assumes  that  something  is  known 
about  the  functional  form  of  the  signal.  The  functional  form  will  be  written  as  Af\(U),  where  A 
is  an  amplitude  and,  for  example,  /](/,)  might  be  a  sinusoid.  The  total  mean-square  difference  6  2 
between  the  model  and  the  pixels  is  given  by 

£[«.  -  Afi(t;)}2  =  *?•  (64) 

If  A |  (),  the  model  must  follow  the  functional  form  exactly.  If  <)]  --  oe  then  the  total  squared 

difference  goes  to  infinity  and  the  reconstruction  will  follow  the  data. 

Using  information  I\  in  a  maximum  entropy  calculation  results  in  assigning  a  Gaussian  prior 


/’(no . /i)  a  exp 


(65) 


where  <  |  is  the  fractional  variance  associated  with  information  I\.  As  was  noted  earlier,  the  prior 
has  introduced  two  new  parameters:  A,  t).  Last,  note  that  the  prior  lias  not  yet  been  normalized, 
this  will  be  done  after  combining  the  three  priors. 

Information  li  specifies  how  the  first  derivative  is  to  behave.  Assuming  the  functional  form  of 
the  first  derivative  is  given  bv  /j(  / )  then 


("t+l  -  "i  I  ■'  H  fAlA}2  -  ^2  (66) 

I  - 1 

v.  here  H  i''  an  amplitude.  .iikI  I'  /  is  the  total  squared  difference.  I  sine  this  as  a  const  taint  in  a 
maximum  •■ntropv  (  ah  u  hit  ion  allows  us  to  assign  a  prior  probability  to  tie'  difference  betwcci,  lie 
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modeled  di-ri vat.ive  ;hhI  the  pixels: 
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where  two  additional  parameters,  B  the  amplitude,  and  < \  the  fractional  variance  have  been  intro¬ 
duced. 

Information  / 3  specifies  how  the  second  derivative  is  to  behave.  Assuming  the  functional  form 
of  the  second  derivative  is  given  by  C fi(tx),  one  has 


y>,+i + “1-1  -  ^M,  -  ch(ti)]2  =  63  (os) 

1=1 


where  (',  is  an  amplitude  associated  with  the  second  derivative,  /3(f)  is  its  functional  form,  and  b2 
is  the  total  squared  difference.  Repeating  the  maximum  entropy  calculation  gives 


(  C 

/’( ui . u„|C\uo,u„+i,<:;j,cr,/3)  oc  exp  j  ^[m,+i  +  'i,_1  _  ^u»  ~  ^/3(<«)j 
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as  the  prior  probability  for  the  pixels  given  information  l-\,  where  C  is  the  amplitude,  and  12  is  the 
associated  fractional  variance. 

Note  that  three  unknown  amplitudes  A ,  B ,  and  three  fractional  variances  <f,  < 2  and  C3,  and 
two  boundary  pixels  uo  have  entered  the  problem.  The  three  amplitudes  and  all  of  the  unknown 
pixels  will  be  eliminated  from  the  problem  as  nuisance  parameters.  In  this  problem  it  is  critically 
important  to  ensure  that  proper  priors  are  used.  A  proper  prior  is  one  which  is  normalizable. 
Improper  priors  are  ones  which  cannot  be  normalized.  Strictly  speaking  a  function  that  cannot  be 
normalized  is  not  a  probability  density  function.  Two  examples  of  improper  priors  are  the  Jeffreys 
prior  and  the  uniform  prior.  The  Jeffreys  prior  is  improper  when  the  limits  on  the  parameter  are 
taken  from  zero  to  infinity.  The  uniform  prior  is  improper  whenever  one  of  the  limits  is  taken  to 
infinity.  In  spite  of  this  the  use  of  improper  or  unnormalizable  prior  probabilities  in  parameter 
estimation  is  often  convenient  and  harmless.  However,  in  this  problem  the  use  of  improper  priors 
must  be  avoided  because  the  normalization  factor  associated  with  the  prior  does  not  always  cancel. 
Consequently  a  normalized  prior  must  be  used  for  A ,  B  and  C  as  well  as  for  all  of  the  pixels.  These 
parameters  are  location  parameters,  and  the  prior  which  correctly  express  information  about  a 
location  parameter  is  a  Gaussian.  Consequently,  the  prior  for  the  three  amplitudes  A.  B.  and  < ' 
will  be  taken  as 


P(A,B,C\,nJnU])  =  (2<ra2rV2'oexP 
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where  /  --  was  made  to  differentiate  l  from  / j,  /?,  and  / 3.  This  prior  says  that  the  three 

amplitudes  may  be  either  positive  or  negative  and  we  do  not  know  which  it  is.  If  <0  is  small, 
then  this  prior  does  not  express  a  strong  opinion  about  the  amplitudes,  other  than  small  absolute 
magnitude  is  preferred.  It  will  be  assumed  that  <0  is  set  from  prior  information  and  that  its  actual 
value  is  known.  So  long  as  this  value  is  small,  the  only  purpose  served  by  the  prior  is  to  prevent 
anv  singular  mathematics  from  occurring  because.  So  whatever  value  is  assigned  to  »0,  it  will  not 
change  the  results,  provided  it  <C  a. 

lust  information  l\.  1 2  and  /,  ran  be  used  to  constrain  the  interior  pixels,  they  may  also  be 
,,sed  td  (.ntstrain  the  houndaiv  pixels.  Information  l\  specified  a  fun*  t  ional  foim  lor  t  fie  -ign.d 
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I  here  is  no  reason  that  /)(/,)  cannot  be  evaluated  at  the  boundary.  This  will  give 


/*(««!«  i, /l)  «  exp  J  -  Af\{U))Y 


/'(Mi/+lkl,/l)  OC  exp  {  --1 -r[wt/+i  -  Afi(iv+i)f 


as  the  prior  probabilities  for  the  lower  and  upper  boundary  pixels. 

Information  /2  specified  prior  information  about  the  first  derivative.  When  the  first  derivative 
was  defined,  asymmetric  difference  equation  was  written,  Eq.  (12).  This  symmetric  difference  can¬ 
not  be  used  on  the  boundary  because  it  would  introduce  still  more  unknown  parameters.  However, 
a  forward  and  backward  first  derivative  can  be  used.  For  uo  one  has 

B(u0\(2,^,h)  <x  exp  |-^2[ut  -  M0  -  fl/2(fo)]2|  •  (73) 

Similarly  a  backward  first  derivative  may  be  defined  and  used  to  formulate  a  prior  for  ul/+  j.  This 
prior  is  given  by 


7'(«i/fi|»2,cT,/2)  a  exp  i -  u„  -  Bft(t,.+  \ )]' 


Hut  just  as  occurred  earlier,  care  must  be  taken  here  because  these  boundary  conditions  are  not 
enough  to  make  the  prior  associated  with  /2  normalizable.  So  when  this  prior  is  actually  pro¬ 
grammed  the  constraints  will  have  to  be  modified  just  enough  to  make  the  matrices  associated 
with  them  nonsingular. 

The  last  information  / 1  may  be  used  to  specify  a  prior  on  the  two  boundary  pixels.  However,  now 
we  have  a  functional  form  for  the  second  derivative.  The  second  derivative  cannot  be  interpreted 
as  specifying  that  neighboring  pixels  be  approximately  equal.  That  interpretation  was  possible 
because  the  prior  information  assumed  adjacent  pixels  were  approximately  equal.  Here  we  could 
specify  a  forward  and  backward  second  derivative,  but  that  will  not  work  because  the  forward 
second  derivative  at  to  is  the  same  as  the  symmetric  second  derivative  at  time  <1,  consequently  we 
would  be  trying  to  constrain  the  same  quantity  to  two  different  values.  Without  doing  something 
much  more  complicated,  there  is  no  easy  wav  to  constrain  the  boundary  pixels  using  the  functional 
form  of  the  second  derivative.  This  is  not  a  problem,  because  I\  and  /2  have  already  supplied  more 
than  enough  information  to  form  a  normalizable  prior  for  the  boundary  pixels. 


2.2  Combining  Different  Prior  Information 

From  th<'  previous  subsection  there  are  ten  probability  density  functions  expressing  prior  infor¬ 
mation  about  the  pixels  (no, . . . ,  u„+i)  and  the  amplitudes.  What  is  needed  is  a  single  prior  that 
express  the  information  contained  in  all  ten  of  these  priors.  The  process  of  combining  these  priors 
is  begun  by  adopting  some  new  notation. 

There  are  three  amplitudes,  two  boundary  pixels,  and  v  interior  pixels.  There  are  //  +  •'>  total 
parameters  (excluding  the  three  fractional  variances).  All  of  them  but  one  are  to  be  eliminated  as 
nuisances.  To  facilitate  this,  the  pixel  values  \uo . (<e-M }  and  the  three  amplitudes  A,  B  and  (  ' 


will  be  taken  as  a  rollertion  V.  The  elements.  r,,  are  defined  as: 


u,  if  ft  £  i  <  v  +  1, 

A  if  i  =  //  +  '2, 

v,  = 

B  if  i  =  v  +  3, 

C  if  i  =  v  +  4. 

In  this  notation  the  posterior  probability  for  the  jth  pixel  is  given  by 
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where  the  word  “pixel”  will  be  used  to  refer  to  all  of  the  V,  including  the  three  amplitudes  A, 
B  and  ('.  The  three  regularizes  have  been  added  to  the  probability  density  function  in  a  wav 
that  indicates  their  value  are  known.  As  was  done  earlier,  if  there  values  are  not  actually  known 
probability  theory  will  be  used  either  estimate  them  or  remove  them. 

In  this  problem  it  will  be  assumed  that  the  prior  information  is  independent  and  the  prior 
probability  for  the  pixels  given  the  total  prior  information  is  just  the  product  of  the  probabilities 
given  the  individual  pieces  of  information: 

f  ( 1  I* 0*  1  \  i  ^2 >  Ar*  /0)d )  -  T( uq,  . . . ,  ?v+ 1 ! 7 ] ,  /0|j ) P( no,  •  -  - 1  iV-y 1 1 ('i '  ^old )  ( 77 ) 

X  P(vo, . .  . ,  lV+l|  Ali  ^0|d  )f>(vi'+2i  -^old )’ 

where  /’( vq, . . . ,  v„+\\1\,  /0|(| )  specifies  the  prior  probability  for  the  interior  and  boundary  pixels 

given  information  7j,  P(v 0,  . . ,  iv+i  |/2,  ^0ld )  sPpf‘fies  the  prior  given  /2,  . tv+i | /^,  /0|j ) 

specifies  the  prior  given  /j,  and  1V+3,  r„-H|ro,  /Q]d)  specifies  the  prior  information  for  the 

three  amplitudes.  The  independence  assumption  was  used  to  factor  the  prior  in  this  particular 
fashion. 

I  he  first  three  terms  are  all  of  the  form  P(V\(  1 ,  /] ,  /0|d).  These  may  be  factored  into  a  lower 
boundary  prior,  and  interior  prior  and  an  outer  boundary  prior: 


P(  V7 1<  1 ,  /1 ,  /0|(j )  =  /’( eo|<  1 ,  /1 ,  /0|d)/’(  »’| . v„\t  1 , 1 1 ,  /Djd )  P(  »v+i  In ,  I] ,  /()|(l  )•  ( 7«) 

These  three  priors  were  given  in  the  previous  section.  This  process  may  be  repeated  for  information 
/2  and  /  |  with  similar  results.  The  remaining  term,  /’( v„+2,  v„+3,  tv+4|<  u.  /0|,| ),  was  given  by 
Eq.  (70). 

For  information  l\,  the  prior  probability  is  given  by  Eq.  (65) 

P(V\f o,n,<r. /l-  /0ld )  a  0XP 

where  the  prior  information  for  .1 ,  Eq.  (70),  was  included  in  this  equation.  This  may  be  rewritten 
in  matrix  form  as 

(  I  1/+4  e-n  'I 
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Similarly  for  information  / 2  the  prior  probability  becomes 


(2  2  u 

~  ^2^y^;,+1  ~  ,  '~1  ~ 


X  exp  |-^j[n|  -  t>0  ~  ^/2(fo)]2| 
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And  in  matrix  form  this  prior  is  given  by 


(  2  e-H  /V  -4-1  1 
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The  matrix  AT/  is  given  b.v 
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will  !'!'  i  U  ■  ,v  i  4)  ;ui(l 

W(i  -  h(to)  +  fi(U  h 
#1  --  /2(h)  ~  fiiU)), 

Mi  =  /2(<i+i)  -  h(ti-i ).  (2<i<i/-l) 

Bn  —  fzi^u+l)  ~  /2(^-l)i 
Bv+\  -  ~[f2(ti/)  +  /2(<v+l  )}, 

#i/+t  ~  ~§  +  f2(fo)1  +  ‘^y~]/2 (<i)2  ~h  /2O./V+I  )2- 
'2  (=f 

Last,  for  information  /3,  the  prior  probability  for  the  pixels  is  given  by 

^(V/ko,<3,<r,/.-5,/0|d)  a  exp  j-^jC2  -  ^  X2^'+1  +  V'-1  _  2v*  ~  ^ /3(<«)]2 1  - 
which  can  also  bo  written  as 
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“2^2  X]  y^Yk‘vkvi 
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with  the  matrix  V7*./  defined  as 
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where  (0  <  A:,  l  <  v  +  4)  and 
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f  inally,  combining  these  throe  priors,  one  obtains  the  probability  for  the  pixels  given  /  |  j .  I\ , 
/„>,  and  ly 


(I  1/+-4  e+4  1 

Y  Y  (<\Wh  +  (jXicl  +  <lYkl)  VkVl  l, 

“a  Jt=o  /— o  v  '  J 


(91) 


where  /,,  /2,  /.i,/0u  —  /• 

Mefore  returning  to  the  main  problem,  this  prior  must  be  normalized.  To  do  this  the  matrix 
Zkl  is  defined: 


%kl  =  ( ]  Wkl  +  <  2  W  + 

and  the  fully  normalized  prior  probability  for  the  pixels  is  given  by 
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(92) 


(93) 


whe  re  { An,  •  ■  ■ ,  A„+4}  is  the  product  of  the  eigenvalues  of  the  Z}k  matrix. 

The  prior,  Eq.  (93),  may  now  be  inserted  into  the  posterior  probability  for  the  jth  pixel,  Eq.  (76), 

to  obtain:  _ 

P{vj\i0,<i,f2,t3,<r,  D,I)  oc  j  ■  dvx  •  •  ■  rr~(l/+5) ^ 
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where’  a  factor  of  (27r)  2  was  dropped. 


2.3  Eliminating  Ntiisance  Parameters 

To  obtain  the  posterior  probability  for  Vj,  all  of  the  parameters  exce>pt  Vj  must  be  removed  bv 
inte'gratiem.  There  are  ;/  -f  4  integrals  that  meist  be  evaluated.  These’  integrals  are  very  similar 
lo  those'  e’valuaieel  in  the  previous  section  and  fe’w  of  the  details  will  be>  given.  To  evaluate  these 
integrals,  the  exponent  in  the  likelihood,  E«j.  (91),  is  squared  to  obtain: 


/’( I’jpo,  e  I, <2, <3,  a,  DJ)  oc  I  ■  ■  dv,  ■  •  -  [All  •  ■  •  A^^cr  feV+e+5) 
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by  D 


(95) 


where  d1  is  the  mean  square  of  the-  elata,  Eq.  (33),  and  the  interaction  matrix,  </u,  is  given  by 


Ukl  -  Xu  +  ■*>«  0  A  k,l  <  f  +  4  (96) 

where  ,Sj(  was  eb’fined  earlier.  Hep  (35). 

Note' t  hat  t  lie  ( iki  matrix  has  been  redefined.  In  fact  that,  is  not  <|uite  true',  it  has  been  generalized. 
As  we  proceed  though  this  1  denial  ion.  each  section  will  generalize  the  re-outs  from  tin’  previous 


sections  Whenever  possible,  these  generalized  equations  will  use  the  same  notation  to  represent 
the  generalized  quantity. 

The  integrals  are  over  all  of  the  V,  except  Vj.  Pixel  v}  behaves  like  a  constant  and  must  be 
handled  in  a  special  manner.  Separating  from  the  integration  variables  one  has 
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where  z  was  defined  earlier,  Eq.  (37). 
Evaluating  the  v  +  4  integrals  gives 
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as  the  posterior  probability  for  the  jth  pixel  with 
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(98) 
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by  ii 
't) 


h(v})  ■  h(vj)  =  V/i,(n;)2  (ijtj)  (100) 

1=0 

where  e/t  is  the  ith  component  of  the  /th  eigenvector  of  the  jth  cofactor  of  Eq.  (96)  and  A(  is  the 
rth  eigenvalue  of  this  matrix. 

If  the  variance  of  the  noise  and  the  regularizes  are  actually  known  then  the  problem  is  completed 
and  Eq.  I  98)  represents  the  best  estimate  of  the  jth  pixel  one  can  make  given  the  three  types  of 
prior  information.  However,  in  general  a  and  c|,...,cs,  are  not  known  and  must  be  determined 
from  the  data. 


2.4  Eliminating  a  As  A  Nuisance  Parameter 


The  posterior  probability  for  v}  independent  of  a  is  computed  in  a  way  analogous  to  what  was 
done  in  subsection  1.3.  The  details  of  the  calculation  will  not  be  repeated  here.  However,  as 
a  reminder,  one  must  assign  a  prior  probability  to  the  standard  deviation  (here  this  is  a  Jeffreys 
prior),  and  integrate  with  respect  to  a  over  its  valid  range  of  values.  Note  that  we  cautioned  against 
using  improper  priors  in  this  escalation  and  this  is  essential  for  location  parameters.  However, 
for  the  scale  parameters  (the  fractional  variances,  ami  a)  use  of  improper  priors  is  harmless.  This 
distribution  is  given  by 


P(vj |'o, <  l .  ^2- «  a-  P-  f)  * 


h(v})  ■  h(v})  +  ldjv}z  -  fi}]i 

NT2 


(101) 


T  hi  s  is  a  Student  /distribution  and  this  result  will  be  applied  in  a  numerical  example,  but  before 
it  can  used.  <o,  <  t ,  <2.  and  <  t  must  either  be  known  or  be  estimated. 


■J.r>  Kslimatiiig  The  l<<^ularizers 

I  lie  joint  posterior  probability  for  < ),  <2,  and  <3  will  !><•  computed  and  used  to  set  the  regularizes. 
In  this  calculation  <o  will  be  assumed  known.  This  parameter  relates  to  the  prior  uncertainty  about 
the  amplitudes  A ,  li,  and  It  will  be  assumed  that  the  experiment  was  designed  in  such  a  way 
that  is  adequate  to  actually  rapture  the  data  in  question.  This  implies  that  one  knows  the  strength 
of  the  signal,  at  least  to  order  of  magnitude  and  this  information  was  used  in  setting  to-  However, 
the  other  three  parameters,  <j,  r2>  and  f  3  relate  to  how  important  the  prior  information  is  compared 
to  the  data  and  this  is  probably  not  known  before  actually  taking  the  data.  Inferences  will  have  to 
be  made  about  these  parameters. 

To  make  inferences  about  these  parameters,  one  uses  the  rules  of  probability  theory  to  eliminate 
the  nuisance  parameters  from  the  problem.  Here  the  standard  deviation,  cr,  will  be  removed  from 
the  posterior  probability,  and  then  the  rules  of  probability  theory  will  be  used  to  make  inferences 
about  the  three  regularizes  This  calculation  is  again  essentially  identical  to  what  was  done  in 
subsection  1.4  and  the  details  of  the  calculation  will  not  be  given.  To  proceed  a  prior  for  a,  e j ,  t2 
and  f 3  must  be  assigned.  Here  a  Jeffreys  priors  will  be  used  for  the  prior  probability  for  all  of  the 
regularization  parameters;  one  obtains 
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•  h(c i , c2, C3)1 
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as  the  joint  posterior  probability  for  q,  <2,  and  <3,  where 
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/»(<!. «2i«3)  ‘  =  ^M(l><2,<3)2  (104) 
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. K+i)  and  ri,  are  the  eigenvalues  and  eigenvectors  of  Isq.  (9(i). 

To  illustrate  the  use  of  the  joint  posterior  probability,  the  example  begun  in  the  previous 

subsection  will  be  continued.  For  simplicity  only  prior  information  about  the  functional  form  ol 
the  signal  will  be  used  in  this  example.  The  data  in  this  example  are  the  same  data  used  in  Fig.  1. 
These  data  has  been  repeated  in  Fig.  f>.  The  solid  line  in  Fig.  5  is  the  estimate  of  all  of  the  pixel 
values  when  the  maximum  of  the  posterior  probability  is  used  as  the  estimate.  The  dashed  lines 
are  the  estimated  uncertainty  in  the  pixel  values  in  the  (mean  ±  standard  deviation)  sense.  These 
estimates  assumed  the  value  of  the  regularizer  was  known. 

To  set  the  regularizer,  the  posterior  probability  for  C)  was  computed.  This  is  given  by  Fig.  b(A). 
Note  that  this  posterior  probability  density  function  has  a  well  defined  maximum  near  .1.  If  one 

computes  the  mean  value  of  c (,  one  finds  (rj)  :  7.21.  It  is  this  mean  value  for  cj  that  was  used 

to  compute  the  estimates  shown  in  Fig.  5  as  the  solid  line.  Note  that  the  estimated  signal  is  flat 
and  only  very  small  deviations  are  observed  from  a  constant  value.  Also  note  that,  the  estimates 
overlap  the  true  value  of  the  constant,  5. 

Next,  the  posterior  probability  for  ti.yj  was  computed  given  that  <  1  =  1,  see  Fig.  6(H).  This  value 
is  relatively  far  from  the  value  indicated  by  probability  theory.  Note  that  the  probability  for  the 
pixel  is  broad  and  smeared  out,  indicating  that  iiy ,  has  not  been  well  estimated.  Hut  also  note  that 
true  value  of  the  pixel  is  covered  bv  this  posterioi!  Panel  {(’)  of  Fig.  li  is  the  posterior  probability 
lor  nvi  given  th.it  <1  7.21.  Here,  the  posterior  1  much  sharper,  and  the  pixel  is  better  r<  •  •  <  >1  ■  •  i . 
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Fig  5  The  functional  form  of  the  signal  was  used  in  the  prior  probability.  The  maximum  of  the  posterior 
probability  for  the  pixels  with  c  “  7.21  is  given  by  the  solid  line.  The  one-standard  deviation  width  of  the 
posterior  is  shown  as  the  dotted  lines.  The  data  (open  circles)  are  shown  for  reference. 

It  is  possible  to  estimate  the  variance  of  the  noise  when  <i  =  1  and  <i  =  7.21.  When  t]  -  1  the 
variance  of  the  noise  is  estimated  to  be  small:  (a)  =  0.77.  When  cj  =  7.21  it  is  estimated  to  be 
(a)  =  0.99.  When  the  posterior  probability  for  the  pixels  is  computed,  one  finds  that  the  estimate 
with  the  largest  estimated  noise  level  has  a  better  determination  of  the  pixels. 

Last,  note  that  the  '  j-standard  deviation  error  bars  shown  in  Fig.  5  are  much  narrower  than 
those  shown  in  Fig.  1,  indicating  that  the  constraint  on  the  functional  form  was  much  more  infor¬ 
mative  than  the  constraint  on  the  second  derivative.  But  in  both  cases  the  estimates  easily  overlap 
the  true  answers  at  one  standard  deviation. 


3  Deconvolution 


Now-  that  the  data  interpolation  problem  has  b"en  thoroughly  addressed,  we  are  in  a  position  to 
proceed  to  the  full  deconvolution  nroblem.  Fortunately,  the  preceding  sections  have  essentially 
solved  the  deconvolution  problem.  As  a  reminder,  in  the  deconvolution  problem  there  is  a  data  set 
D  that  is  composed  of  a  signal  plus  additive  noise: 

d,  =  J  <lTr[t,  -  t)u(t)  +  ?i,  i  =  {1,/?  +  1,2/iJ  +  I . i/}  (105) 

where  r(t,  —  r)  is  the  impulse  response  function.  On  a  discrete  grid,  r  takes  on  values  only  at  the 
discrete  times  r,  and  this  equation  is  written 
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T,}  =  r[l,  -  r})&T  (107) 

ami  Ar  is  the  time  interval  between  pixel  values. 

The  calculation  for  the  posterior  probability  for  the  pixel  values  proceeds  just  as  in  the  previous 
sections.  T  he  posterior  probability  for  pixel  is  given  by 
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where  the  prior  P(V\I)  will  be  taken  as  Fq.  (93).  Introduction  of  the  convolution  operation  only 
complicates  the  direct  probability  or  likelihood,  not  the  prior. 

Squaring  the  likelihood  and  substituting  Eq.  (93)  for  the  prior,  one  obtains 
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where  l)k  is  a  kind  of  weighted  averaged  over  the  data,  and  is  defined  as 
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the  <7*1  matrix  generalizes  to 
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3.1  Eliminating  Nuisance  Parameters 

As  observed  in  subsection  1.3,  the  pixel  being  estimated,  v} ,  behaves  as  if  it  were  a  constant  in  the 
integrals  and  must  be  treated  specially.  This  is  done  by  separating  from  the  integration  variables 
to  obtain: 


/’(  ';|<0,<,.c2.<3.<t.  D.  1)  x 


x 


(/V+k+5) 


exp 


1 

'la1 


v  44  1/4-4 


E  E 

k  =11  /-II 

k *  I  I?) 


9klvkvl 


Nd 2  -  IDjVj  +  HjjVj 


i=u 

i  i  ) 


(113) 


30 


Evaluating  th«*  u  f  I  integrals  gives 
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as  the  posterior  probability  for  the  jth  pixel,  where 
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r/,  is  the  tth  component  of  the  /th  eigenvector  of  the  jth  cofactor  of  Eq.  (Ill),  and  A(  is  its  ith 
eigenvalue. 


H.2  Eliminating  <t  As  A  Nuisance  Parameter 

Computing  the  posterior  probability  for  Uj  independent  of  cr  is  essentially  identical  to  what  was 
done  in  subsection  1.4  and  the  details  of  this  calculation  will  not  be  given  here.  The  posterior 
probability  for  jth  pixel  value  is  given  by 


h(vj)  ~  h(vj)  +  'ZDjV)  ~  gjjVj 
nJ1  "  ~ 


(117) 


This  is  a  Student  /  distribution  and  it  is  this  result  that  is  applied  in  the  numerical  examples.  But 
before  any  numerical  calculation  may  be  done  <o  <d,  <2i  and  El  must  either  be  known  or  estimated. 


Estimating  The  Regularizes 

As  was  done  previously.  <(i  will  be  assumed  known  and  inferences  about  C|,  (2>  and  <3  will  be  made, 
lo  proceed,  a  prior  for  rr,  (|,  and  f 3  must  be  assigned.  Here  .leffreys  priors  will  be  used  for 
all  ol  the  parameters.  The  prior  probability  for  the  pixels  was  already  assigned,  Eq.  (93).  and  the 
probability  for  the  data  is  given  bv  Eq.  (9).  Using  these,  one  obtains  the  joint  posterior  probability 
for  <  1 ,  r 2  and  iy. 
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where  the  eigenvalues.  (  \ii . A, ,33},  are  the  eigenvalues  ,if  Eq.  (9'J1.  and  11,  [from  Eq.  ('id  vva.> 

1  ept.iceil  bv  c,  to  ioiif"Mn  to  the  current  notation.  Evaluating  all  ol  the  integrals  and  dropping 
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some  irrelevant  constants,  one  obtains: 
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as  the  joint  posterior  probability  for  the  three  fractional  variances,  where 

j  e+4 


Mm, <2. <3)  -  —7=  T  D,c i„ 


(120) 
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Ml,<2,<3)  •  Mi,42.M)  =  ^Mm,«2,C|)2,  (121) 

f=0 

(Af, . }  are  the  eigenvalues  of  the  gy  matrix,  Eq.  (Ill),  and  e/,  is  the  tth  component  of  the 

/th  eigenvector  of  this  matrix. 


3.4  Examples  -  Deconvolution 

To  illustrate  this  calculation,  several  deconvolution  examples  will  be  given  which  incorporate  differ 
ent  types  of  prior  information.  In  the  first  example,  very  little  prior  information  will  be  available; 
all  that  will  be  used  is  a  constraint  on  the  smoothness  of  the  function.  In  the  second  example,  more 
prior  information  will  be  available,  and  the  functional  form  of  the  signal  will  be  used  to  constrain 
the  deconvolution.  In  the  third  example,  both  sets  of  prior  information  will  be  used  to  constrain 
the  deconvolution.  The  data  will  be  simulated  sinusoidal  data  that  have  been  low-pass  filtered. 
This  problem  is  important  in  radar  target  identification,  because  it  is  the  free  space  signal  that 
must  be  known  in  the  target  identification  problem. 

The  signal  function  will  be  taken  to  be  a  pure  sinusoid  of  known  frequency  and  phase: 


u,  =  10cos(0.3<j). 


(122) 


However,  this  signal  has  been  filtered  using  a  low-pass  filter: 


M«) 


(123) 


where  the  constant  c  is  given  by 


=  X> 


-0.251, 


i=l 


(124) 


and  the  times  t,  were  taken  to  be  0, 1, . . . ,  N  —  1.  Note  that  the  smearing  function  is  defined  to 
be  zero  for  times  less  than  f]  or  greater  than  t,\.  The  data  are  a  convolution  between  the  signal 
function.  u(f).  and  the  impulse  response  function  r(t): 


,v 

d,  =  ^lOcos(O.3t;)e-"2r,!<1“bl0(fl  -  t})/c  +  v„ 
; -i 


(125) 


where  v,  represents  noise  of  unit  standard  deviation,  and  0(tt  -  t })  is  the  unit  step  function. 

The  filter  changes  the  amplitude  of  the  response.  Consequently,  the  time  domain  signal  to  noise 
ratio  I-  not  |U;  rather  it  is  more  like  5.  A  plot  ot  tfie  impulse  response  Inn,  turn  lor  I  mu.  is  shown 
in  fig.  7 1  A  ).  |)ata  item  1(1(1  is  .,  weighted  average  of  all  ol  the  preceding  signal  values.  A  s  you  go 


Figure  7:  Deconvolution  The  Data 


Time  (Response  Centered  at  100) 


>R  7  llx'  filter  response  function  par  I  (A),  mixes  the  components  of  the  sinusoid.  The  mixing  takes 
I, ire  .is  a  weighted  average  Here  all  values  from  I  100  backward  in  time  are  averaged  to  produce  the 
input  from  the  filter  The  data,  open  circles  panel  (H).  remain  a  sinusoid  hut  shifted  in  time  and  with 
rcreased  amplitude  lie  piohlem  is  in  recover  the  original  signal  solid  line  in  panel  (B). 


bark  in  tim*\  the  signal  values  become  less  and  less  important  in  this  average,  finally  dropping  to 
essentially  zero  after  20  sampling  intervals.  The  data  values  (convolved  signal  p  noise)  are  shown 
in  fig.  7iH)  as  the  open  circles.  The  true  signal  is  shown  a.s  the  solid  line  m  panel  7(H).  The 
convolution  introduces  an  effective  amplitude  change  and  phase  shift,  while  the  “noise"  introduces 
uncertainty  about  the  "true"  convolved  signal. 

I'o  remove  the  effect  of  the  convolution,  a  constraint  on  the  second  derivative  will  be  used.  To 
apply  the  posterior  probability  for  the  pixels,  one  must  first  set  the  value  of  the  regularizes  This 
is  done  by  computing  the  posterior  probability  for  the  regularizer  given  the  data  and  the  prior 
information.  IV q.  (119).  This  is  shown  in  F'ig.  X(A).  As  in  the  previous  examples,  this  probability 
density  function  has  a  well  defined  maximum  near  <3  as  0.8.  This  maximum  value  was  used  in 
computing  the  posterior  probability  for  the  pixel  values,  Eq.  (117).  The  maximum  of  the  posterior 
probability  for  the  pixels  (solid  line)  plus  or  minus  one  standard  deviation  (dashed  line)  is  shown 
in  8(H).  The  true  signal  values  are  shown  as  the  plus  signs.  Notice  the  true  signal  is  covered  almost 
everywhere  at  one  standard  deviation.  Also  note  that  there  is  a  systematic  misfit  in  the  peaks 
and  valleys.  That  is  because  the  prior  information  tries  to  make  the  second  derivative  a.s  small 
as  possible.  At  these  turning  points  the  second  derivative  is  at  its  maximum,  so  of  course  the 
reconstruction  will  undershoot  the  mark  here.  Last,  note  that  the  reconstruction  is  bad  near  time 
t  =  1 00.  Hut  probability  theory  knows  th.s  and  has  widened  the  error  bars,  so  that,  the  true  value 
is  still  overlapped  at  two  standard  nations. 

in  the  second  part  of  this  ex  •  tiple,  use  of  the  correct  functional  form  of  the  signal  will  be 
investigated.  Here  it  will  1  :  ..timed  that  the  signal  must  be  a  cosine  with  the  known  correct 
frequency.  The  posterior  probability  for  the  regularizer,  Eq.  (119),  is  shown  in  Fig.  9(A).  Again 
there  is  a  peak  near  ( 1  s  0.25.  This  value  of  cj  was  then  used  to  compute  the  posterior  probability 
for  each  of  the  pixels,  Eq.  (117).  The  maximum  of  the  posterior  probability  for  each  pixel  is  shown 
in  Fig  9(H)  as  the  solid  line.  The  one  standard  deviation  error  bars  are  shown  as  the  dashed  lines. 
The  true  signal  is  shown  as  the  plus  signs.  Note  that  the  reconstruction  follows  the  signal  much 
more  closely;  The  true  signal  is  easily  covered  by  the  one-standard-deviation  error  bars.  However, 
unlike  the  previous  example  this  reconstruction  does  not  know  about  the  “smoothness”  of  the 
function  so  the  reconstruction  is  jagged,  even  though  it  actually  fits  the  data  better.  T  his  suggest 
that  these  two  pieces  of  prior  information  could  be  combined,  and  this  reconstruction  would  he 
better  than  either  separately. 

Hepeating  this  example  using  both  the  second  derivative  constraint  and  the  functional  form  of 
tl  ■  signal  is  more  difficult  because  now  there  are  two  regularizers:  f]  the  regularizer  associated 
< ,’ith  the  functional  form,  and  <  $  associated  with  the  second  derivative  constraint.  As  in  the  other 
examples,  to  compute  the  posterior  probability  for  a  pixel,  we  must  set  these  regularizers.  This 
is  done  by  computing  the  joint  posterior  probability  for  the  regularizers,  Eq.  (119).  and  then 
marginalizing  over  either  rj  or  «3-  In  Fig.  10  the  joint  posterior  probability  for  these  two  regularizers 
has  been  plotted.  The  dashed  contours  are  the  base  10  logarithm  of  P(i  \ ,  < a | r 0 .  D.l).  The  region 
enclosed  hv  the  contour  labeled  ft  contains  99%  of  the  total  probability.  The  region  enclosed  by  the 
contour  labeled  8  contains  99 .9%,  of  the  total  probability,  etc.  The  solid  lines  inside  of  contour  9  is 
the  fully  normalized  joint  posterior  probability. 

From  this  joint,  posterior  probability  for  ci  and  <3,  it  is  possible  to  compute  the  posterior  prob¬ 
ability  for  either  f]  or  > 3  bv  using  the  sum  rule  from  probability  theory.  The  posterior  probability 
for  <  1  is  given  by 

P{<  ]\>n.  IK  I)  ~  ( d< a /’(r |  ,<3! to.  D.  / )  (  1 29 ) 


it 


Fit;  8.  In  panel  (A)  the  posterior  probability  for  the  regularizer  (3  is  shown.  As  in  previous  examples  any 
value  of  (3  rlose  to  the  maximum  yields  essentially  identical  deconvolutions.  Panel  (B)  shows  the  peak  value 
nf  each  est  imated  pixel  value  (solid  line)  plus  or  minus  one  standard  deviation  (dashed  line).  The  true  values 
are  shown  as  the  plus  signs 
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Fir  11  From  the  joint  posterior  probability  for  and  f.i  (Fig.  10),  one  can  easily  compute  the  posterior 
probability  for  each  f i ,  panel  (A),  and  <3,  panel  (B).  Using  the  maxima  from  these  marginal  distributions, 
a  ±  standard  deviation  estimate  for  the  pixel  values  were  computed.  The  maxima  are  shown  in  ((')  as  the 
solid  line  the  one-standard-deviation  error  bars  are  shown  as  the  dashed  lines,  and  the  true  signal  values 
are  Riven  by  the  plus  signs. 


(127) 


:< n< I  I  lie  posterior  proba In! i I  v  lor  i  i  is  given  bv 

/’(mI'ik  IK  I)  =  J  <l<  1 1'(<  i,c.i|<o,  D.  /). 

These  two  probability  density  functions  have  been  plotted  in  Fig  11(A)  and  (B)  respectively.  The 
peak  value  tor  <  \  is  approximately  0.35  and  for  n  it  is  approximately  0.25.  These  values  were  used 
to  compute  the  posterior  probability  for  the  pixels.  The  peak  values  are  shown  in  (C)  as  the  solid 
line,  the  one-standard deviation  error  bars  are  shown  as  the  d.ashed  lines,  and  the  true  signal  values 
are  given  by  the  plus  signs.  Note  that  this  reconstruction  has  combined  the  best  features  of  the 
two  previous  examples:  The  use  of  information  about  the  functional  form  causes  the  reconstruction 
to  follow  the  true  signal  much  more  closely,  while  use  of  the  smoothing  constraint  has  suppressed 
much  of  the  random  fluctuation. 


4  Deconvolution  -  Generalizations 


The  results  of  the  preceding  calculations  can  be  generalized  in  a  number  of  ways  by  allowing  more 
general  types  of  prior  information.  When  the  priors  were  established  for  the  deconvolution  problem, 
only  one  function  per  type  of  prior  information  was  allowed.  There  is  no  reason  why  more  functions 
cannot  be  allowed,  and  in  many  cases  the  need  for  them  is  obvious.  For  example,  suppose  f\  (the 
functional  form)  were  a  cosine,  then  a  second  function,  a  sine,  is  needed  to  properly  express  the 
phase  of  the  sinusoid.  Additionally,  only  three  pieces  of  prior  information  were  used:  one  on  the 
functional  form  of  the  signal,  one  on  its  first  derivative,  and  one  on  its  second  derivative.  There 
is  no  reason  why  one  could  not  have  more  then  three  pieces  of  prior  information,  and  these  could 
constrain  more  complicated  functions  of  the  pixels  than  just  the  first  and  second  derivatives 

In  this  section,  the  deconvolution  results  presented  in  the  previous  sections  will  be  generalized  to 
allow  for  any  number  of  pieces  of  prior  information.  This  information  can  specify  functional  forms 
containing  any  number  of  amplitudes  and  functions,  and  these  functions  will  be  allowed  to  constrain 
an  arbitrary  linear  combination  of  the  pixels.  The  total  number  of  pieces  of  prior  information  will 
be  designated  as  r.  Facb  piece  of  prior  information  will  be  designated  as  /),...,  /r.  For  information 
/,,,  the  constraint  will  be  written 
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where  «{*  is  a  known  matrix  of  coefficients  that  describe  how  the  pixels  interact.  For  example,  ii 
could  describe  the  second  derivative  constraint  used  earlier.  The  coefficients  are  the  amplitudes 
or  intensities  of  the  signal  functions,  and  they  will  be  considered  as  unknown,  nuisance  parameters. 
The  constraint  functions  f[‘  are  the  analogue  of  the  functions  and  fo)  used  earlier.  However, 

there  are  m/(  of  these  functions  for  each  of  the  r  constraints.  There  are  a  total  of  £ mn  functions 
and  amplitudes.  Kacli  constraint  will  have  a  fractional  variance  or  regularizer  associated  with  it. 
These  regnlarizers  will  be  designated  as  <i,...,fr.  Last,  note  that  the  sum  over  discrete  times  (the 
i  index)  runs  from  0  <  i  <  v  +  1.  So  the  above  constraints  are  written  implicitly  to  include  the 
boundary  conditions. 

( 'onverting  the  /it  h  constraint  into  a  probability  density  function  for  t  he  pixels  is  straightforward 
and  results  in 
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where  the  first  term  expresses  the  prior  information  about  the  amplitudes  and  the  second  expresses 
t fie  prior  information  about  the  pixels.  As  in  the  previous  examples,  this  prior  may  be  converted 
into  a  prior  with  a  double  sum:  this  gives 
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and  (77+  1)  is  the  total  number  of  unknown  generalized  pixels  vt.  Following  what  was  done  earlier, 
these  generalized  pixels  are  defined  as: 
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l  A\ - ,„r  if  v  +  I  +  mi  +  •  •  •  +  mr_ \  <  i  <  t). 


Last  the  matrix  W fa  is  defined  as 


Wfi  =  K,  -  di  +  <%. 


(133) 


where  c£,  and  d correspond  to  the  coefficients  of  the  terms  obtained  by  squaring  the  exponent, 
combining  all  of  the  terms  and  carrying  out  the  sum  over  i.  The  matrix  is  just  the  coefficient 
of  the  first  term  of  the  square  in  Eq.  (129),  and  is  given  by 
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Note  that  in  setting  up  the  general  Wfa  matrix,  the  indices  are  allowed  to  take  on  values  0  <  fc,l  <  tj, 
so  in  the  definition  of  6^  it  was  necessary  to  state  explicitly  that  this  term  is  zero  when  either  k 
or  l  was  greater  than  v  +  1.  The  matrix  c*{  corresponds  to  the  coefficient  of  the  cross  term,  and  is 
given  by 
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where  Ski  is  the  Kronerker  delta  function.  As  was  done  previously,  the  individual  priors  may  be 
combined  to  obtain  a  single  prior  which  expresses  all  of  the  prior  information.  This  prior  is  given 
by 


P(V\t . . (T,I)=  y/Ao---A,,  (2f<r2)  *+  exP 5Z 


where 
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and  { Ao  -  -  -  A,,}  are  the  eigenvalues  of  the  Zkl  matrix. 

fhe  mathematics  from  the  three  previous  sections  may  now  be  repeated  to  obtain  a  generalized 
result.  First,  the  posterior  probability  for  the  jth  generalized  pixel  is  given  by 
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where  Dj  was  defined  earlier,  Eq.  (110), 
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and  A[,  •  •  •  A'f  are  (he  eigenvalues  of  the  jth  cofactor  of  the  gu  matrix.  The  <)kl  matrix  is  defined  as 

9U=Zki+S\i  (142) 


where  Ski  was  defined  in  Eq.  (112). 

Next,  the  posterior  probability  for  the  jth  pixel  value,  independent  of  the 
is  given  by 
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variance  of  the  noise. 
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fast,  the  joint  marginal  posterior  probability  for  the  regularizes  is  given  bv 
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where  A',  ■  •  A';  are  the  eigenvalues  of  the  g^  matrix,  Eq.  (14‘2). 
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where  in  Eqs.  ( 144-146),  the  eigenvalues  {Ag, - A'^}  are  the  eigenvalues  of  the  g}k  matrix,  Eq.  (142), 

and  r,,  are  its  eigenvectors. 

Note  that  care  must  be  taken  when  interpreting  the  results  of  these  calculations,  because  the 
notation  for  the  eigenvalues  and  eigenvectors  has  not  been  changed  when  different  matrices  were 
used.  The  meaning  should  remain  clear  because  when  each  formula  is  given  the  matrices  being 
diagonalized  are  clearly  stated.  But  just  to  be  clear  on  this  point,  when  the  posterior  probability 
for  the  j th  pixel  is  being  computed  the  eigenvalues  A'0,...,Anu'  and  eigenvectors  ei}  refer  to  the 
jth  cofactor  of  the  g} jt  matrix.  However,  when  the  posterior  probability  for  the  regularizes  is 
computed  A^  . . .  A'„  refer  to  the  eigenvalues  ot  the  g}k  matrix  (not  thejth  cofactor)  and  ejk  refer  to 
its  eigenvectors  of  g}k. 


4.1  Estimating  The  Pixel  Values 

It  is  one  thing  to  formally  derive  a  result  and  quite  another  for  it  to  he  useful.  The  posterior 
probability  for  the  individual  pixels  given  all  of  the  prior  information,  Eq.  (143),  is  one  of  these 
types  of  results.  While  this  result  will  prove  useful  in  examining  individual,  important,  pixels  it  is 
not  the  way  to  estimate  all  of  the  pixels.  Even  if  one  were  to  compute  this  posterior  probability 
density  for  all  of  the  pixels,  it  still  would  not  give  one  an  estimated  signal;  rather  it  would  tell 
one  what  is  actually  known  about  the  signal  values  and  the  uncertainty  in  those  values.  What  is 
actually  needed  is  an  estimate  of  the  pixels  and  the  uncertainty  in  the  estimate. 

There  are  many  ways  to  estimate  a  parameter  using  probability  theory  and  the  estimate  of 
choice  will  depend  on  what  one  stands  to  lose  if  one  is  wrong.  Two  different  types  of  estimates 
are  the  maximum  of  the  posterior  probability,  and  mean  or  expected  value  of  a  parameter.  In 
this  calculation,  the  expected  value  and  peak  values  are  the  same.  Thus  an  the  expected  value 
and  standard  deviation  estimate  for  the  pixels  is  readily  available  and  will  be  used  as  the  pixel 
estimates. 

The  expected  value  of  the  jth  generalized  pixel  is  given  by 

(tt;|<0 - - # r i cr ,  / )  =  J dt;0  . .  .dv,,VjP(vQ,.. .  • .  ,<r ,<r,  D,/)  (147) 

where  the  notation  . 'r.cr,  /)  means  the  expected  value  of  pixel  given  that  fn,...,cr, 

and  er  are  known.  But  note  that  it  is  the  fully  normalized  joint  probability  density  function  that  is 
to  be  used.  Consequently,  when  this  calculation  is  performed  the  probability  density  function  will 
have  tf>  be  normalized: 
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Evaluating  tin’  integrals.  one  obtains 
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Similarly  the  expected  mean-square  value  of  the  pixels  is  given  by 
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and  one  finds 

If 

(152) 
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From  which  one  would  make  a  (mean  ±  standard  deviation)  estimate  of 


t=0 


Jk 


(153) 


Note  that  while  the  individual  probability  distributions  would  require  one  to  invert  a  matrix  for 
each  value  of  vj\  the  (mean  ±  standard  deviation)  estimate  may  be  done  for  all  of  the  pixels  with 
a  single  matrix  inversion. 


4.2  Estimating  The  Noise  Level 

Before  the  above  result  ran  be  used,  (a2)  must  be  computed.  To  compute  (rr2),  one  must  evaluate 


{( t *}  =  <r2P(a\io,--,€T,DJ)dn 


154) 


where  /*( cr jc '  "dn  0,1)  is  the  fully  normalized  posterior  probability  for  a  given  the  regularizes 
and  the  data.  But  using  the  rules  of  probability  theory,  this  is  just  the  prior  probability  for  o  times 
the  probability  for  the  regularizers  given  a.  So  the  expectation  value  may  be  written  as: 


a2)  -  a'2daP{a\l)P(<  ],■  ■  ■  ,ir\f0,<r,  DJ). 
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and  /»(<  i , . . .  ,fr)  ■  h(< \ . tr)  is  given  by  Eq.  ( 145).  The  normalization  constant  needed  to  ensure 

t  hiit  the  total  probability  is  one  is  given  by 
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Making  the  appropriate  substitutions  and  evaluating  the  integrals  gives 
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as  the  e.-timated  standard  deviation  for  the  noise. 

At  i  his  point  in  the  calculation  it  would  appear  that  another  numerical  example  is  needed 
to  illustrate  these  new  additional  calculations  and  generalizations.  However  that  is  not  the  case, 
because  all  of  the  examples  given  in  the  text  were  computed  by  using  these  final  results.  That  is  to 
say,  all  of  the  computer  programs  used  in  the  numerical  calculations  implemented  this  generalized 
calculation.  To  produce  anv  specific  example  the  model  functions  and  the  pixel  smearing  matrices 
were  changed  to  produce  the  desired  calculation. 


5  Summary  And  Conclusions 

Proceeding  through  stages,  this  paper  has  explored  the  deconvolution  problem  in  varying  degrees  of 
complexity.  In  the  first  two  sections,  the  deconvolution  problem  was  simplified  to  the  interpolation 
problem.  This  problem  was  then  explored  to  see  how  varying  the  prior  information  affects  the 
results  of  the  calculation.  These  calculations  illustrate  that  the  interpolation  problem  is  easily 
solved  bv  incorporating  prior  information  into  the  problem.  The  more  cogent  the  prior  information 
the  better  the  reconstructions.  However,  even  with  very  crude  prior  information  probability  theory 
does  not  lie.  The  interpolations  always  covered  the  torrect  signal  at  one  and  sometimes  two  standard 
deviations. 

After  obtaining  an  understanding  of  the  interpolation  problem,  the  calculation  was  then  gener¬ 
alized  to  include  the  convolution.  Including  the  convolution  did  not  actually  change  the  results  from 
the  first  two  sections,  it  only  generalized  them.  The  effect  of  prior  information  was  then  explored 
again  to  show  how  including  different  types  of  prior  information  affects  the  results.  Again  the 
results  were  essentially  identical  to  what  was  found  in  the  first  two  sections:  Including  more  cogent 
prior  information  helps  the  deconvolution  problem;  but  again  when  only  limited  prior  information 
is  available,  the  results  obtained  overlapped  the  correct  result  at  one  and  sometimes  two  standard 
deviations. 

Last,  the  entire  formalism  was  generalized  to  include  much  more  arbitrary  types  of  prior  in¬ 
formation.  This  formalism,  given  in  the  preceding  section,  is  the  only  version  of  the  calculation 
programmed  on  the  computer.  Every  example  given  in  this  work  was  essentially  an  example  of  the 
power  of  the  general  calculation  presented  in  the  previous  section. 

This  work  represents  at  best,  a  first  initial  exploration  of  the  deconvolution  problem.  Much 
remains  to  be  done.  For  example  this  work  did  not  address  the  use  of  priors  outside  of  the  class 
of  general  Gaussian  priors.  While  this  class  is  wide,  it  does  not  include  such  priors  as  the  entropy 
prior.  An  interesting  problem  would  be  to  try  to  combine  the  best  aspects  of  both  the  entropy  prior 
and  the  Gaussian  priors  used  in  this  calculation.  Indeed  there  is  some  evidence  based  on  work  in 
other  fields  that  this  could  he  very  productive,  [28]. 

Last,  this  work  suggests  how  to  use  probability  theory  to  solve  other  types  of  outstanding 
problems.  In  particular  relatively  straightforward  modification  to  this  calculation  will  allow  inho¬ 
mogeneous  linear  differential  equations  with  either  boundary  value,  initial  value,  or  any  other  type 
of  asymptotic  condition  to  be  solved.  Additionally,  using  the  techniques  developed  in  this  paper, 
the  moment  problem,  i.e.,  inferring  a  function  from  a  finite  number  of  its  moments,  should  now  tie 
a  solvable  problem.  The  onlv  change  in  this  rah  illation  is  that  the  limit  as  the  noise  variance  goes 
to  zero  is  needed  to  solve  this  problem. 

If  there  is  a  single  major  accomplishment  for  this  paper,  it  was  to  demonstrate  that  the  results 
one  obtains  depends  critically  on  t  he  prior  information  put  into  the  problem.  To  put.  it  bluntly,  t  here 
is  no  sui  b  thing  as  a  single  best  deconvolution.  Every  result  from  a  Hayesian  calculation  is  only  as 
good  a.  the  prior  information  t  fi.it  goes  into  it.  However,  every  Hayesian  tabulation  carrm  with  it 
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■  i  up  ill  ilic  n  iiiiti  ;n  n  I  v  ill  tin'  i  ill  i*  u  I  it  t  i  i  ii  i .  While  Minn'  priors  will  give  poor  rcconstrui  I  ions, 

probability  theory  warns  mu- of  this  by  making  I  In-  uncertainty  in  tin1  i-stimates  large  (largo  enough 
to  mver  the  correct  value  of  the  signal).  So  even  the  results  from  very  uninformative  priors  still 
give  meaningful  reconstructions. 
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